Core Artificial Intelligence Fundamentals for Industry Professionals

Fundamentals of Artificial Intelligence


Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

1. Machine Learning and Deep Learning




1.1. Taxonomy of AI



1.2. Scikit Learn

  • Machine Learning in Python
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
  • https://scikit-learn.org/stable/index.html




1.3. Supervised Learning

  • Given training set $\left\{ \left(x^{(1)}, y^{(1)}\right), \left(x^{(2)}, y^{(2)}\right),\cdots,\left(x^{(m)}, y^{(m)}\right) \right\}$
  • Want to find a function $f_{\omega}$ with learnable parameters $\omega$
    • $f_{\omega}(x)$ should be as close as possible to $y$ for future pairs $(x,y)$
    • i.e., $f_{\omega}(x) \approx y$
  • Define a loss function
$$\ell \left(f_{\omega} \left(x^{(i)}\right), y^{(i)}\right)$$
  • Solve the following optimization problem:
$$ \begin{align*} \text{minimize} &\quad \frac{1}{m} \sum_{i=1}^{m} \ell \left(f_{\omega} \left(x^{(i)}\right), y^{(i)}\right)\\ \text{subject to} &\quad \omega \in \boldsymbol{\omega} \end{align*} $$
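
The optimization problem above can be sketched numerically. The example below is only an illustration under stated assumptions: the data set is synthetic, the model $f_{\omega}$ is assumed to be a straight line, the loss $\ell$ is assumed to be the squared error, and plain gradient descent with a hand-picked step size stands in for the solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: m pairs (x, y) drawn from y = 2x + 1 plus small noise
m = 50
x = rng.uniform(0, 5, size=m)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(m)

def f(omega, x):
    """Assumed model: f_omega(x) = omega[0] + omega[1] * x."""
    return omega[0] + omega[1] * x

def empirical_loss(omega):
    """(1/m) * sum of squared losses over the training set."""
    return np.mean((f(omega, x) - y) ** 2)

# Minimize the average loss with plain gradient descent
omega = np.zeros(2)
step = 0.05
for _ in range(5000):
    residual = f(omega, x) - y
    grad = np.array([2 * residual.mean(), 2 * (residual * x).mean()])
    omega -= step * grad

print(omega, empirical_loss(omega))
```

The learned `omega` should land close to the intercept and slope used to generate the data, which is the sense in which $f_{\omega}(x) \approx y$.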


  • Function approximation between inputs and outputs


  • Once it is learned,

2. Regression

  • A set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables

2.1. Linear Regression



2.2. Multivariate Linear Regression



2.3. Nonlinear Regression



2.4. Feature Selection

  • Multivariate regression


$$ \hat{y} = \theta_0 + \theta_{1}x_1 + \theta_{2}x_2 + \theta_{3}x_3 + \cdots $$

  • The process of selecting a subset of relevant features (variables, predictors) for use in model construction.
  • Feature selection techniques are used for several reasons:
    • simplification of models to make them easier to interpret,
    • shorter training times,
    • to avoid the curse of dimensionality,
    • improve data's compatibility with a learning model class,
    • encode inherent symmetries present in the input space.
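
As one possible illustration of feature selection, the sketch below scores each feature with a univariate F-test and keeps the two best. The data are synthetic, and the choice of `SelectKBest` with `f_regression` is an assumption made for this example, not the method prescribed by the notes.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)

# Hypothetical data: 5 candidate features, only columns 0 and 2 drive y
X = rng.standard_normal((200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.standard_normal(200)

# Keep the k features with the highest univariate F-score
selector = SelectKBest(score_func=f_regression, k=2)
X_selected = selector.fit_transform(X, y)

selected = np.flatnonzero(selector.get_support())
print(selected)           # indices of the retained columns
print(X_selected.shape)   # reduced design matrix
```

A regression fitted on `X_selected` has two parameters instead of five, which is the simplification the list above refers to.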

2.5. Correlation Coefficient

  • $+1 \to$ close to a straight line with positive slope

  • $-1 \to$ close to a straight line with negative slope

  • Indicates how close the data lie to a straight line, but

  • Gives no information on the steepness of the slope (only its sign)

$$0 \leq \left\lvert \text{correlation coefficient} \right\rvert \leq 1$$
$$\hspace{1cm}\begin{array}{c}\leftarrow\\ (\text{uncorrelated})\end{array} \quad \quad \quad \begin{array}{c}\rightarrow \\ (\text{linearly correlated})\end{array}$$
  • Does not tell anything about causality
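
A short numerical check of these properties, using `np.corrcoef` on made-up noise-free data (the slopes and the parabola are arbitrary choices for illustration):

```python
import numpy as np

x = np.linspace(0, 10, 50)

r_steep = np.corrcoef(x, 5.0 * x + 1.0)[0, 1]     # steep positive slope
r_shallow = np.corrcoef(x, 0.1 * x - 3.0)[0, 1]   # shallow positive slope
r_negative = np.corrcoef(x, -2.0 * x)[0, 1]       # negative slope

# A perfectly deterministic but symmetric nonlinear relation
u = x - 5.0
r_parabola = np.corrcoef(u, u ** 2)[0, 1]

print(r_steep, r_shallow, r_negative, r_parabola)
```

The steep and the shallow line give the same coefficient, so the value says nothing about how steep the line is. The parabola gives a coefficient of essentially zero even though $y$ is fully determined by $x$, so "uncorrelated" only means "no linear relationship".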

2.6. Correlation Coefficient Plot



2.7. Python

2.7.1. Linear Regression

In [1]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [2]:
import warnings
warnings.filterwarnings(action = 'ignore') 
In [3]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# data points in column vector [input, output]
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)

# to plot
plt.figure(figsize = (10, 6))
plt.title('Linear Regression', fontsize = 15)
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.plot(x, y, 'ko', label = "data")
plt.xlim([0, 5])
plt.grid(alpha = 0.3)
plt.axis('scaled')
plt.show()
In [4]:
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(x,y)
Out[4]:
LinearRegression()
In [5]:
print(reg.coef_)       # Coef
print(reg.intercept_)  # Bias
[[0.67129519]]
[0.65306531]
In [6]:
# to plot
plt.figure(figsize = (10, 6))
plt.title('Linear Regression', fontsize = 15)
plt.xlabel('X', fontsize = 15) 
plt.ylabel('Y', fontsize = 15)
plt.plot(x, y, 'ko', label = "data")

# to plot a straight line (fitted line)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = reg.coef_*xp + reg.intercept_

plt.plot(xp, yp, 'r', linewidth = 2, label = "$L_2$")
plt.legend(fontsize = 15)
plt.axis('scaled')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.show()
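
As a cross-check on the fitted line, the same coefficients can be obtained by solving the least-squares problem directly. This is a sketch that assumes the usual design matrix with a column of ones for the intercept; it uses `np.linalg.lstsq` rather than scikit-learn.

```python
import numpy as np

x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)

# Design matrix: a column of ones (intercept) next to the input column
A = np.hstack([np.ones_like(x), x])

# Least-squares solution of A @ theta ≈ y
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(theta.ravel())   # [intercept, slope]
```

The two entries of `theta` should agree with `reg.intercept_` and `reg.coef_` printed above.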

2.7.2. Nonlinear Regression

In [7]:
n = 100            
x = -5 + 15*np.random.rand(n, 1)
noise = 10*np.random.randn(n, 1)
y = 10 + 1*x + 2*x**2 + noise

plt.figure(figsize = (10, 6))
plt.title('Nonlinear Regression', fontsize = 15)
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.plot(x, y, 'o', markersize = 4, label = 'actual')
plt.xlim([np.min(x), np.max(x)])
plt.grid(alpha = 0.3)
plt.legend(fontsize = 15)
plt.show()
In [8]:
from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree = 2, include_bias = False)
x_poly = poly_features.fit_transform(x)

reg.fit(x_poly, y)
Out[8]:
LinearRegression()
In [9]:
p = reg.predict(x_poly)
In [10]:
plt.figure(figsize = (10, 6))
plt.title('Nonlinear Regression', fontsize = 15)
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.plot(x, y, 'o', markersize = 4, label = 'actual')
plt.plot(x, p, 'ro', markersize = 4, label = 'predict')
plt.grid(alpha = 0.3)
plt.legend(fontsize = 15)
plt.xlim([np.min(x), np.max(x)])
plt.show()

3. Classification

  • The output $y$ takes discrete values (class labels)
    • Develop a classification algorithm that determines which class a new input belongs to
  • Goal: find a classification boundary
  • We will learn
    • Support Vector Machine (SVM)
    • Logistic Regression


3.1. Linear Classification


3.2. Non-linear Classification


3.3. Python

3.3.1. SVM

In [11]:
x1 = 8*np.random.rand(100, 1)
x2 = 7*np.random.rand(100, 1) - 4

g0 = 0.8*x1 + x2 - 3
g1 = g0 - 1
g2 = g0 + 1

C1 = np.where(g1 >= 0)[0]
C2 = np.where(g2 < 0)[0]

X1 = np.hstack([x1[C1],x2[C1]])
X2 = np.hstack([x1[C2],x2[C2]])
n = X1.shape[0]
m = X2.shape[0]
X = np.vstack([X1, X2])
y = np.vstack([np.zeros([n, 1]), np.ones([m, 1])])

plt.figure(figsize = (10, 6))
plt.plot(x1[C1], x2[C1], 'ro', label = 'C1')
plt.plot(x1[C2], x2[C2], 'bo', label = 'C2')
plt.xlabel('$x_1$', fontsize = 20)
plt.ylabel('$x_2$', fontsize = 20)
plt.legend(loc = 4)
plt.xlim([0, 8])
plt.ylim([-4, 3])
plt.show()
In [12]:
from sklearn.svm import SVC

clf = SVC(kernel = 'linear')
clf.fit(X, y)
Out[12]:
SVC(kernel='linear')
In [13]:
print(clf.coef_)
print(clf.intercept_)
[[-0.6977739  -0.86857702]]
[2.596101]
In [14]:
xp = np.linspace(0,8,100).reshape(-1,1)
yp = -clf.coef_[0,0]/clf.coef_[0,1]*xp - clf.intercept_/clf.coef_[0,1]

plt.figure(figsize = (10, 6))
plt.plot(X[0:n, 0], X[0:n, 1], 'ro', label = 'C1')
plt.plot(X[n:, 0], X[n:, 1], 'bo', label = 'C2')
plt.plot(xp, yp, '--k', label = 'SVM')
plt.xlabel('$x_1$', fontsize = 20)
plt.ylabel('$x_2$', fontsize = 20)
plt.legend(loc = 4)
plt.xlim([0, 8])
plt.ylim([-4, 3])
plt.show()
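
To make the roles of `coef_` and `intercept_` concrete, the sketch below recomputes the SVM decision by hand on a separate synthetic data set (two well-separated Gaussian clouds, an assumption made for this example). It relies on the standard facts that a linear SVM classifies by the sign of $\omega^T x + b$ and that the margin width is $2/\lVert \omega \rVert$.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two hypothetical, well-separated point clouds
X_neg = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
X_pos = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(50, 2))
X = np.vstack([X_neg, X_pos])
y = np.hstack([np.zeros(50), np.ones(50)])

clf = SVC(kernel='linear')
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

# Signed score g(x) = w^T x + b; its sign decides the class
scores = X @ w + b
pred = (scores > 0).astype(float)

# Distance between the two margin lines g(x) = +1 and g(x) = -1
margin_width = 2.0 / np.linalg.norm(w)

print(clf.support_vectors_.shape[0], margin_width)
```

Only the support vectors, the points lying on or inside the margin, determine `w` and `b`; the remaining points could be removed without changing the boundary.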

3.3.2. Logistic Regression

In [15]:
m = 500

X0 = np.random.multivariate_normal([0, 0], np.eye(2), m)
X1 = np.random.multivariate_normal([10, 10], np.eye(2), m)

X = np.vstack([X0, X1])
y = np.vstack([np.zeros([m,1]), np.ones([m,1])])

plt.figure(figsize = (10, 6))
plt.plot(X0[:,0], X0[:,1], '.b', label = 'Class 0')
plt.plot(X1[:,0], X1[:,1], '.k', label = 'Class 1')

plt.title('Data Classes', fontsize = 15)
plt.legend(loc = 'lower right', fontsize = 15)
plt.xlabel('X1', fontsize = 15)
plt.ylabel('X2', fontsize = 15)
plt.xlim([-10,20])
plt.ylim([-4,14])
plt.grid(alpha = 0.3)
plt.show()
In [16]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X, y)
Out[16]:
LogisticRegression()
In [17]:
print(clf.coef_)
print(clf.intercept_)
[[0.9270674  0.91347544]]
[-9.20052827]
In [18]:
xp = np.linspace(-10,20,100).reshape(-1,1)
yp = -clf.coef_[0,0]/clf.coef_[0,1]*xp - clf.intercept_/clf.coef_[0,1]

plt.figure(figsize = (10, 6))
plt.plot(X0[:,0], X0[:,1], '.b', label = 'Class 0')
plt.plot(X1[:,0], X1[:,1], '.k', label = 'Class 1')
plt.plot(xp, yp, '--k', label = 'Logistic')
plt.xlim([-10,20])
plt.ylim([-4,14])

plt.title('Data Classes', fontsize = 15)
plt.legend(loc = 'lower right', fontsize = 15)
plt.xlabel('X1', fontsize = 15)
plt.ylabel('X2', fontsize = 15)
plt.grid(alpha = 0.3)
plt.show()
In [19]:
pred = clf.predict_proba([[0,6]])
pred
Out[19]:
array([[0.97633193, 0.02366807]])
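
The probability returned by `predict_proba` can be reproduced by hand for the two-class case: it is the sigmoid of the linear score $\omega^T x + b$. The sketch below regenerates similar synthetic data with its own random seed, so the numbers will differ from the output above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
m = 500
X0 = rng.multivariate_normal([0, 0], np.eye(2), m)
X1 = rng.multivariate_normal([10, 10], np.eye(2), m)
X = np.vstack([X0, X1])
y = np.hstack([np.zeros(m), np.ones(m)])

clf = LogisticRegression()
clf.fit(X, y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x_new = np.array([[0.0, 6.0]])
z = x_new @ clf.coef_.T + clf.intercept_     # linear score w^T x + b
p_class1 = sigmoid(z).ravel()                # P(y = 1 | x)

print(p_class1, clf.predict_proba(x_new)[:, 1])
```

The decision boundary plotted earlier is the set of points where this score is zero, that is, where both classes have probability 0.5.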

4. Steps for Machine Learning

4.1. Model Evaluation

  • Adding more features never increases the training loss, so the training loss alone cannot tell us when a model is good
  • How do we determine when an algorithm achieves “good” performance?


  • A better criterion:
    • Training set (e.g., 70 %)
    • Testing set (e.g., 30 %)
  • Performance on the testing set is called generalization performance
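
A minimal sketch of this evaluation procedure, assuming synthetic quadratic data and polynomial models of increasing degree as the "more features" case. The 70/30 split uses scikit-learn's `train_test_split`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Hypothetical noisy quadratic data
x = rng.uniform(-1, 1, size=(60, 1))
y = 1.0 + 2.0 * x[:, 0] ** 2 + 0.2 * rng.standard_normal(60)

# 70 % training set, 30 % testing set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

train_mse, test_mse = {}, {}
for degree in (1, 2, 9):
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    reg = LinearRegression().fit(poly.fit_transform(x_train), y_train)
    train_mse[degree] = mean_squared_error(y_train, reg.predict(poly.transform(x_train)))
    test_mse[degree] = mean_squared_error(y_test, reg.predict(poly.transform(x_test)))

print(train_mse)
print(test_mse)
```

The training error can only go down as the degree grows, because each model contains the previous one. The testing error is the number to compare when choosing between the models.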

4.2. Supervised Learning

  • Workflow



  • Workflow in more detail



5. Supervised Learning vs. Unsupervised Learning




6. Clustering

  • Data clustering is an unsupervised learning problem

  • Given:

    • $m$ unlabeled examples $\{x^{(1)},x^{(2)},\cdots, x^{(m)}\}$
    • the number of partitions $k$
  • Goal: group the examples into $k$ partitions


$$\{x^{(1)},x^{(2)},\cdots,x^{(m)}\} \quad \Rightarrow \quad \text{Clustering}$$


6.1. K-means



6.2. Python

In [20]:
m = 200

X0 = np.random.multivariate_normal([-1, 1], np.eye(2), m)
X1 = np.random.multivariate_normal([15, 10], np.eye(2), m)
X2 = np.random.multivariate_normal([0, 6], np.eye(2), m)
X = np.vstack([X0, X1, X2])

plt.figure(figsize = (10, 6))
plt.plot(X[:,0], X[:,1], '.b')

plt.xlim([-10,20])
plt.ylim([-4,14])
plt.grid(alpha = 0.3)
plt.show()
In [21]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 3, random_state = 0)
kmeans.fit(X)
Out[21]:
KMeans(n_clusters=3, random_state=0)
In [22]:
print(kmeans.labels_)
[2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0]
In [23]:
plt.figure(figsize = (10,6))

plt.plot(X[kmeans.labels_ == 0,0],X[kmeans.labels_ == 0,1],'g.', label = 0)
plt.plot(X[kmeans.labels_ == 1,0],X[kmeans.labels_ == 1,1],'k.', label = 1)
plt.plot(X[kmeans.labels_ == 2,0],X[kmeans.labels_ == 2,1],'r.', label = 2)

plt.xlim([-10, 20])
plt.ylim([-4, 14])
plt.grid(alpha = 0.3)
plt.legend(loc = 'lower right', fontsize = 15)
plt.show()
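
Besides the labels, a fitted `KMeans` object exposes the cluster centers, and a new point is assigned to the nearest one. The sketch below regenerates comparable data with its own seed and sets `n_init = 10` explicitly; both are choices made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
m = 200
means = np.array([[-1, 1], [15, 10], [0, 6]], dtype=float)
X = np.vstack([rng.multivariate_normal(mu, np.eye(2), m) for mu in means])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)

# A new point is assigned to the cluster whose center is nearest
x_new = np.array([[14.0, 9.0]])
dists = np.linalg.norm(kmeans.cluster_centers_ - x_new, axis=1)
print(np.argmin(dists), kmeans.predict(x_new)[0])
```

The cluster indices 0, 1, 2 are arbitrary: they identify groups but carry no meaning, and they need not follow the order in which the data were generated.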

7. Dimension Reduction

  • Why dimensionality reduction?
    • Insights into the low-dimensional structures in the data (visualization)
    • Fewer dimensions ⇒ Less chances of overfitting ⇒ Better generalization
    • Speeding up learning algorithms
      • Most algorithms scale badly with increasing data dimensionality
    • Less storage requirements (data compression)
    • Note: Dimensionality reduction is different from feature selection
      • although their goals are similar
    • Dimensionality reduction is more like “feature extraction”
      • Constructing a small set of new features from the original features
  • How?
    • idea: highly correlated data contains redundant features




7.1. Principal Component Analysis (PCA)

  • Each example $x$ has 2 features $\{x_1,x_2\}$

  • Consider ignoring the feature $x_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$

  • Are we losing much information by throwing away $x_2$ ?

  • No. Most of the data spread is along $x_1$ (very little variance along $x_2$)





  • Each example $x$ has 2 features $\{x_1,x_2\}$

  • Consider ignoring the feature $x_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$

  • Are we losing much information by throwing away $x_2$ ?

  • Yes, the data has substantial variance along both features (i.e. both axes)





  • Now consider a change of axes

  • Each example $x$ has 2 features $\{u_1,u_2\}$

  • Consider ignoring the feature $u_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{u_1\}$

  • Are we losing much information by throwing away $u_2$ ?

  • No. Most of the data spread is along $u_1$ (very little variance along $u_2$)





  • Data $\rightarrow$ projection onto unit vector $\hat{u}_1$
    • PCA is used when we want projections capturing maximum variance directions
    • Principal Components (PC): directions of maximum variability in the data
    • Roughly speaking, PCA does a change of axes that can represent the data in a succinct manner




In [24]:
m = 5000
mu = np.array([0, 0])
sigma = np.array([[3, 1.5], 
                  [1.5, 1]])

X = np.random.multivariate_normal(mu, sigma, m)

fig = plt.figure(figsize = (10, 6))
plt.plot(X[:,0], X[:,1], 'k.')
plt.axis('equal')
plt.show()
In [25]:
from sklearn.decomposition import PCA

pca = PCA(n_components = 2)
pca.fit(X)
Out[25]:
PCA(n_components=2)
In [26]:
plt.figure()
plt.stem(range(1,3),pca.explained_variance_ratio_)

plt.xlim([0.5, 2.5])
plt.ylim([0, 1])
plt.title('Explained Variance Ratio')
plt.show()
In [27]:
principal_axis = pca.components_[0, :]
h = principal_axis[1]/principal_axis[0]

xp = np.linspace(-6,6,200)
yp = xp.dot(h)

plt.figure(figsize=(10,6))
plt.plot(X[:, 0], X[:, 1],'k.')
plt.plot(xp, yp, 'r.')
plt.axis('equal')
plt.show()
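
The principal axes found by `PCA` can be cross-checked against an eigen-decomposition of the sample covariance matrix. This is a sketch on freshly generated data with the same covariance as above; `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
sigma = np.array([[3.0, 1.5],
                  [1.5, 1.0]])
X = rng.multivariate_normal([0, 0], sigma, 5000)

pca = PCA(n_components=2).fit(X)

# Eigen-decomposition of the sample covariance matrix
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

u1 = eigvecs[:, 0]                            # direction of maximum variance
z = (X - X.mean(axis=0)) @ u1                 # 1-dimensional projection onto u1

print(eigvals, pca.explained_variance_)
```

The largest eigenvalue equals the variance of the projected data `z`, and `u1` matches the first row of `pca.components_` up to sign.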

8. Decision Tree

8.1. Decision Tree for Classification







In [28]:
from sklearn import tree
In [29]:
data = np.array([[0, 0, 1, 0, 0],
                [1, 0, 2, 0, 0],
                [0, 1, 2, 0, 1],
                [2, 1, 0, 2, 1],
                [0, 1, 0, 1, 1],
                [1, 1, 1, 2, 0],
                [1, 1, 0, 2, 0],
                [0, 0, 2, 1, 0]])      

x = data[:,0:4]
y = data[:,4]
print(x, '\n')
print(y)
[[0 0 1 0]
 [1 0 2 0]
 [0 1 2 0]
 [2 1 0 2]
 [0 1 0 1]
 [1 1 1 2]
 [1 1 0 2]
 [0 0 2 1]] 

[0 0 1 1 1 0 0 0]
In [30]:
clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 3, random_state=0)
clf.fit(x,y)
Out[30]:
DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
In [31]:
# [?, Yes, Low, Medium]
clf.predict([[0, 0, 1, 0]])
Out[31]:
array([0])
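
The `criterion = 'entropy'` setting refers to Shannon entropy. The sketch below computes the entropy of the labels and an ID3-style information gain for each feature of the small table above. Note the assumption: the gain here splits on every distinct feature value, whereas scikit-learn's trees use binary threshold splits, so the numbers illustrate the criterion rather than reproduce the fitted tree.

```python
import numpy as np

data = np.array([[0, 0, 1, 0, 0],
                 [1, 0, 2, 0, 0],
                 [0, 1, 2, 0, 1],
                 [2, 1, 0, 2, 1],
                 [0, 1, 0, 1, 1],
                 [1, 1, 1, 2, 0],
                 [1, 1, 0, 2, 0],
                 [0, 0, 2, 1, 0]])
x, y = data[:, 0:4], data[:, 4]

def entropy(labels):
    """Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Entropy reduction from splitting on every distinct feature value."""
    remainder = 0.0
    for value in np.unique(feature):
        mask = feature == value
        remainder += mask.mean() * entropy(labels[mask])
    return entropy(labels) - remainder

gains = [information_gain(x[:, j], y) for j in range(x.shape[1])]
print(entropy(y), gains)
```

With 3 positive and 5 negative labels the entropy is about 0.954 bits, and under this multiway split the first feature gives the largest gain.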

8.2. Nonlinear Classification

In [32]:
X1 = np.array([[-1.1,0],[-0.3,0.1],[-0.9,1],[0.8,0.4],[0.4,0.9],[0.3,-0.6],
               [-0.5,0.3],[-0.8,0.6],[-0.5,-0.5]])
     
X0 = np.array([[-1,-1.3], [-1.6,2.2],[0.9,-0.7],[1.6,0.5],[1.8,-1.1],[1.6,1.6],
               [-1.6,-1.7],[-1.4,1.8],[1.6,-0.9],[0,-1.6],[0.3,1.7],[-1.6,0],[-2.1,0.2]])

X1 = np.asmatrix(X1)
X0 = np.asmatrix(X0)

plt.figure(figsize=(10, 8))
plt.plot(X1[:,0], X1[:,1], 'ro', label = 'C1')
plt.plot(X0[:,0], X0[:,1], 'bo', label = 'C0')
plt.title('Nonlinear Data', fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.axis('equal')
plt.show()
In [33]:
N = X1.shape[0]
M = X0.shape[0]

X = np.vstack([X1, X0])
y = np.vstack([np.ones([N,1]), np.zeros([M,1])])
In [34]:
clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 4, random_state=0)
clf.fit(X,y)
Out[34]:
DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=0)
In [35]:
clf.predict([[0, 1]])
Out[35]:
array([1.])
In [36]:
# to plot
[X1gr, X2gr] = np.meshgrid(np.arange(-3,3,0.1), np.arange(-3,3,0.1))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], 'ro', label = 'C1')
plt.plot(X0[:,0], X0[:,1], 'bo', label = 'C0')
plt.plot(Xp[C1,0], Xp[C1,1], 'gs', markersize = 8, alpha = 0.1, label = 'Decision Tree')
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.axis('equal')
plt.show()

8.3. Multiclass Classification

In [37]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## generate three simulated clusters
mu1 = np.array([1, 7])
mu2 = np.array([3, 4])
mu3 = np.array([6, 5])

SIGMA1 = 0.8*np.array([[1, 1.5],
                       [1.5, 3]])
SIGMA2 = 0.5*np.array([[2, 0],
                       [0, 2]])
SIGMA3 = 0.5*np.array([[1, -1],
                       [-1, 2]])

X1 = np.random.multivariate_normal(mu1, SIGMA1, 100)
X2 = np.random.multivariate_normal(mu2, SIGMA2, 100)
X3 = np.random.multivariate_normal(mu3, SIGMA3, 100)

y1 = 1*np.ones([100,1])
y2 = 2*np.ones([100,1])
y3 = 3*np.ones([100,1])

plt.figure(figsize = (10, 8))
plt.title('Generated Data', fontsize = 15)
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()
In [38]:
X = np.vstack([X1, X2, X3])
y = np.vstack([y1, y2, y3])

clf = tree.DecisionTreeClassifier(criterion = 'entropy', max_depth = 3, random_state = 42)
clf.fit(X,y)
Out[38]:
DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=42)
In [39]:
res = 0.3
[X1gr, X2gr] = np.meshgrid(np.arange(-2,10,res), np.arange(0,12,res))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]
C2 = np.where(q == 2)[0]
C3 = np.where(q == 3)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.plot(Xp[C1,0], Xp[C1,1], 's', color = 'blue', markersize = 8, alpha = 0.1)
plt.plot(Xp[C2,0], Xp[C2,1], 's', color = 'orange', markersize = 8, alpha = 0.1)
plt.plot(Xp[C3,0], Xp[C3,1], 's', color = 'green', markersize = 8, alpha = 0.1)
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()

8.4. Decision Tree for Regression

  • A decision tree performs regression when the predicted outcome is a real number rather than a class label.





9. Ensemble

9.1. Ensemble Learning

  • Ensembles: collections of predictors
    • Combine predictions to improve performance
  • Ensemble with different models
  • Assume that combined models make better predictions than a single good model
    • Reduce overfitting and improve accuracy





  • Ensemble with different train datasets



  • To test, run each trained model
    • For regression, each regressor predicts and the predictions are averaged
    • For classification, each classifier votes on the output and the majority wins
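
The idea of training on different datasets and voting can be sketched by hand. The example below assumes bootstrap resampling (sampling with replacement) to create the different training sets and shallow decision trees as the base classifiers; the data, the number of trees, and the depth are all illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical two-class data
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([3, 3], 1.0, size=(100, 2))])
y = np.hstack([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

# Train each tree on a different bootstrap sample of the training set
trees = []
for seed in range(25):
    idx = rng.integers(0, len(X), size=len(X))     # sample with replacement
    tree = DecisionTreeClassifier(max_depth=3, random_state=seed)
    trees.append(tree.fit(X[idx], y[idx]))

# Each classifier votes; the majority wins (25 trees, so no ties)
votes = np.stack([t.predict(X) for t in trees])    # shape (n_trees, n_samples)
majority = (votes.mean(axis=0) > 0.5).astype(int)

accuracy = (majority == y).mean()
print(accuracy)
```

For regression the last step would average the individual predictions instead of taking a vote.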



9.2. Random Forest

  • Ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time
  • Random Forest for Classification



  • Random Forest for Regression




9.3. More Advanced Ensemble Algorithms

  • LightGBM (Light Gradient-Boosting Machine)
  • XGBoost (eXtreme Gradient Boosting)
  • CatBoost
  • ...
  • We do not need to understand how they work, but we will use them.

9.4. K-Fold Cross-Validation

  • Useful especially for a small data set
  • Advantages of cross validation
    • It helps detect overfitting, since every model is scored on data it was not trained on
    • Cross validation helps in finding the optimal value of hyperparameters to increase the efficiency of the algorithm.
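
A minimal sketch of K-fold cross-validation used for hyperparameter selection. It assumes synthetic three-class data, a random forest as the model, and `max_depth` as the hyperparameter being compared; `KFold` and `cross_val_score` are from scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)

# Hypothetical three-class data
centers = np.array([[1, 7], [3, 4], [6, 5]], dtype=float)
X = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers])
y = np.repeat([1, 2, 3], 100)

# 5 folds; shuffle because the rows are ordered by class
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Compare candidate hyperparameter values by their mean validation accuracy
mean_scores = {}
for depth in (1, 3, 5):
    clf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv)     # one accuracy per fold
    mean_scores[depth] = scores.mean()

print(mean_scores)
```

Every sample is used for validation exactly once, which is why the procedure is attractive when the data set is small.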




9.5. Python

In [40]:
from sklearn import ensemble

clf = ensemble.RandomForestClassifier(n_estimators = 100, max_depth = 3, random_state = 0)
clf.fit(X, y)
Out[40]:
RandomForestClassifier(max_depth=3, random_state=0)
In [41]:
res = 0.3
[X1gr, X2gr] = np.meshgrid(np.arange(-2,10,res), np.arange(0,12,res))

Xp = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
Xp = np.asmatrix(Xp)

q = clf.predict(Xp)
q = np.asmatrix(q).reshape(-1,1)

C1 = np.where(q == 1)[0]
C2 = np.where(q == 2)[0]
C3 = np.where(q == 3)[0]

plt.figure(figsize = (10, 8))
plt.plot(X1[:,0], X1[:,1], '.', label = 'C1')
plt.plot(X2[:,0], X2[:,1], '.', label = 'C2')
plt.plot(X3[:,0], X3[:,1], '.', label = 'C3')
plt.plot(Xp[C1,0], Xp[C1,1], 's', color = 'blue', markersize = 8, alpha = 0.1)
plt.plot(Xp[C2,0], Xp[C2,1], 's', color = 'orange', markersize = 8, alpha = 0.1)
plt.plot(Xp[C3,0], Xp[C3,1], 's', color = 'green', markersize = 8, alpha = 0.1)
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 0, 12])
plt.show()

10. Artificial Neural Networks (ANN)

  • Complex/Nonlinear universal function approximator
    • Linearly connected networks
    • Simple nonlinear neurons
  • Hidden layers
    • Autonomous feature learning
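
The two ingredients above, linear connections and simple nonlinear neurons, can be sketched as a single forward pass in NumPy. Everything here is an assumption made for illustration: the layer sizes (784 inputs, 100 hidden units, 10 outputs), the ReLU and softmax nonlinearities, and the random, untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical sizes: 784 inputs, 100 hidden units, 10 output classes
W1, b1 = 0.01 * rng.standard_normal((784, 100)), np.zeros(100)
W2, b2 = 0.01 * rng.standard_normal((100, 10)), np.zeros(10)

x = rng.random(784)                # stand-in for a flattened 28x28 image

h = relu(x @ W1 + b1)              # linear connection followed by a simple nonlinearity
p = softmax(h @ W2 + b2)           # class probabilities

print(p.shape, p.sum())
```

With untrained weights the probabilities are close to uniform; training adjusts `W1`, `b1`, `W2`, `b2` so that the hidden layer `h` becomes a useful learned feature vector.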




10.1. Machine Learning vs. Deep Learning

  • Machine learning




  • Deep supervised learning






10.2. ANN in Python





In [42]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
In [43]:
mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
In [44]:
img = train_x[5].reshape(28,28)

plt.figure(figsize = (6,6))
plt.imshow(img, 'gray')
plt.xticks([])
plt.yticks([])
plt.show()
In [45]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28)),
    tf.keras.layers.Dense(units = 100, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])
In [46]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
In [47]:
# Train Model

loss = model.fit(train_x, train_y, epochs = 5)
Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2720 - accuracy: 0.9221
Epoch 2/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1231 - accuracy: 0.9642
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0858 - accuracy: 0.9744
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0653 - accuracy: 0.9794
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0509 - accuracy: 0.9843
In [48]:
# Evaluate Test Data

test_loss, test_acc = model.evaluate(test_x, test_y)
313/313 [==============================] - 1s 2ms/step - loss: 0.0847 - accuracy: 0.9748
In [49]:
test_img = test_x[np.random.choice(test_x.shape[0], 1)]

predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))
Prediction : 6